Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions
Authors
Abstract
Stopping criteria for Stochastic Gradient Descent (SGD) methods play important roles, from enabling adaptive step size schemes to providing rigor for downstream analyses such as asymptotic inference. Unfortunately, current stopping criteria for SGD methods are often heuristics that rely on asymptotic normality results or convergence to stationary distributions, which may fail to exist for nonconvex functions and, thereby, limit the applicability of such stopping criteria. To address this issue, in this work, we rigorously develop two stopping criteria for SGD that can be applied to a broad class of nonconvex functions, which we term Bottou-Curtis-Nocedal functions. Moreover, as a prerequisite for developing these stopping criteria, we prove that the gradient function evaluated at SGD’s iterates converges strongly to zero for Bottou-Curtis-Nocedal functions, which addresses an open question in the SGD literature. As a result of our work, our rigorously developed stopping criteria can be used to develop new adaptive step size schemes or bolster other downstream analyses for nonconvex functions.
Related resources
Convergence of Stochastic Gradient Descent for PCA
We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the ...
Convergence Analysis of Gradient Descent Stochastic Algorithms
This paper proves convergence of a sample-path based stochastic gradient-descent algorithm for optimizing expected-value performance measures in discrete event systems. The algorithm uses increasing precision at successive iterations, and it moves against the direction of a generalized gradient of the computed sample performance function. Two convergence results are established: one, for the ca...
Quantized Stochastic Gradient Descent: Communication versus Convergence
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very la...
Convergence diagnostics for stochastic gradient descent with constant step size
Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...
On Early Stopping in Gradient Descent Learning
In this paper, we study a family of gradient descent algorithms to approximate the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by a polynomial decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. Thes...
Journal
Journal title: Mathematical Programming
Year: 2021
ISSN: 0025-5610, 1436-4646
DOI: https://doi.org/10.1007/s10107-021-01710-6